Abstract. We add updates to the query language PrQL, designed for 
inspecting machine representations of proofs. PrQL natively supports 
hiproofs that express proof structure using hierarchically nested labelled 
trees, which we claim is a natural way of taming the complexity of huge 
proofs. Query-driven updates allow us to change this structure, in par- 
ticular, to transform proofs produced by interactive theorem provers into 
forms that are easier for humans to understand, or that could be con- 
sumed by other tools. In this paper we motivate and define basic up- 
date operations, using an abstract denotational semantics of hiproofs 
and queries. This extends our previous semantics for queries based on 
syntactic tree representations. We define update operations that add and 
remove sub-proofs or manipulate the hierarchy to group and ungroup 
nodes. We show that these basic operations are well-behaved and hence 
can form a sound core for a hierarchical transformation language. Our 
study here is firmly in language design and semantics; implementation 
strategies and study of sub-languages of our query language with good 
complexity will come later. 


1 Introduction 

We are interested in ways to exploit machine representations of proofs con- 
structed by interactive or automated theorem provers. These proof representa- 
tions are produced so that they can be independently checked or imported into 
other systems. We believe that they can be exploited beyond this. For example, 
system inputs such as proof scripts are rarely given at the lowest level of detail, 
even with interactive theorem provers. Therefore it can be useful for proof de- 
velopers to understand how the system has found a proof: which inference rules 
have been used, which axioms, which instantiations for existential variables, and 
so on. More complex questions are also interesting. For example, whether a proof 
contains unnecessary detours or replicated sub-proofs. 

To this end, we recently introduced PrQL [2], a proof query language which 
treats a large formal proof as an object that can be examined in a systematic 
way. We are currently developing practical prototypes to experiment with proof 
queries, so far based on exporting from Isabelle [2] and HOL Light [21]. But 
it is clear already that as well as asking questions, we also want to be able to 
transform proofs to alter their structure in various ways. This may be used to 
aid understanding (human or machine), by hiding certain kinds of details. Or it 
could be used for optimisation or adaptation, to change proofs to more efficient 
forms, or for consumption by different systems such as proof commentary tools 
or machine learning tools. This paper is a study of a rigorous foundation for 
such transformations, introducing update extensions for PrQL. 

To study the foundations of updates, we need to have the right data model for 
hiproofs and define operations that preserve the hiproof structure. Some trans- 
formations may also preserve theoremhood of proved statements. This is why we 
design our own query and transformation language, rather than immediately en- 
coding our concepts into a more general graph or tree model (such as XML) with 
an existing query and transformation language (such as XQuery Update [10] or 
XDuce [20]) that could make arbitrary dissections and rearrangement. 

When it comes to implementing our query and update language, it is obvi- 
ously desirable to reuse existing systems which have looser semantics but opti- 
mised implementations for query language fragments in good complexity classes. 
We may consider, for example, graph databases, other tools in the “NoSQL” 
family or perhaps even SPARQL. We are conducting some early experiments in 
parallel with the work described here. 


Contributions and paper outline. This paper contributes towards generic foun- 
dational aspects of theorem proving systems, in particular, the novel aspects of 
querying and transforming the proof objects which can be recorded by proof 
tools. Moreover, we contribute to the study of a structured representation for 
these objects. Sect. 2 introduces the idea of proof transformations that we are 
studying, with some informal examples and motivations. Sect. 3 recaps the tech- 
nical background of hiproofs and PrQL. Sect. 4 introduces a revised denota- 
tional semantics for hiproofs; this extends previous work, connecting the syn- 
tactic strand of [3] with the previous denotational semantics of [14]. The new 
extensions add explicit orderings among subtrees and the ability to model open, 
i.e., incomplete, proofs. Sect. 5 gives a new denotational semantics to our query 
language. This interpretation provides two advances: (1) the ability to return 
locations in the hiproof where a query is satisfied, and (2) a close connection to 
a graph model that we can use to encode hiproofs. Sect. 6 builds on top of this 
to define our four kinds of update operations. We show that these operations 
are well-behaved and preserve proofs in certain senses. Finally, we give a more 
detailed comparison to related work in the concluding Sect. 7. 

2 Querying and transforming hierarchical trees 

We start from hiproofs [3, 14], which provide an abstract, generic notion of proof 
tree with hierarchical structure. Hiproofs are composed from atomic rules of 
inference from an unspecified underlying logic, but additionally provide a notion 
of hierarchy, by allowing labelling and nesting of subtrees inside boxes. This 
succinct notion of structuring in a proof can be used, for example, for noting 
where a lemma was applied, or where a particular tactic or external proof tool 
produced a subtree. The hierarchical structure of hiproofs and its interaction 
with the proof-tree is more complex than the straightforward tree structure, in 
particular because hiproofs allow nesting of partially completed proofs. 



Fig. 1. Different hiproof structures on the same underlying proof, panels (a), (b) and (c) 


The picture shown in Fig. 1(a) is an example hiproof, shown at a certain 
level of abstraction. It corresponds with an ordinary (but upside-down) natural 
deduction style proof tree: the theorem being proved, $\forall x.\exists y.\, y < x + 1$, is shown 
at the top, and then the proof outline shows how the proof is achieved by de- 
composing the goal theorem into pieces. The labelled boxes correspond to tactics 
which have been applied to do this. Notice how the Induction box encapsulates 
an incomplete proof; it has the dangling edge which is passed into the Solver box. 
We suppose that boxes such as Base may contain further details, perhaps right 
down to atomic inferences in the underlying system; the diagram only hints at 
the full hiproof. Fig. 1(a) shows the statements being proved along edges. In a 
visualisation tool (such as the web-based HipCam [21]) the goals may be shown 
in pop-ups so as not to clutter the display, and boxes such as Base can be opened 
and closed dynamically. 

Variations of hierarchy. Further right in Fig. 1 we see some alternative structuring 
of this simple inductive proof. Fig. 1(b) shows the complete step case being 
enclosed by the induction box; whereas Fig. 1(c) shows just the induction rule 
itself being boxed. These pictures motivate our main kind of desirable trans- 
formations: to alter and introduce hierarchical structure. For example, when an 
inductive proof appears in the proof tree, we might like to give it the uniform 
structure on the left so it can be easily picked apart. However, the proofs which 
arise by a naive labelling of tactics in HOL Light without hiproof adaptation [21], 
for example, have the form in Fig. 1(c). 

Basic transformations. Generally, the life cycle of data management is captured 
by functions to create, read, update, and delete. We already have mechanisms 
to create proof objects: abstractly, via the syntax for hiproofs reviewed in 
Sect. 3, and in practice by functions for exporting proof objects from systems 
like Isabelle [2] and HOL Light [21]. To inspect proof objects, PrQL provides a 
language of structured queries, reviewed further below. To manipulate existing 
hiproofs, we need to add update and delete operations. But we want to do this 
in a way that respects the proof structure, rather than as arbitrary edits to a 
tree or graph. This motivates the following four types of operation. 

Introduce hierarchy is used when moving from Fig. 1(c) to Fig. 1(a): we introduce 
a nested hiproof called Base for the two steps ExIntro and Trivial, which hides 
the detail. We also push the children of Rule into the Induction box. 

Remove hierarchy is the opposite transformation. Visualisation tools perform 
this reversibly under user control, but here we want to permanently transform 
the underlying structure by pulling out individual pieces, such as when 
moving from Fig. 1(b) to Fig. 1(a). 

Remove subproof deletes part of a hiproof. This is a radical operation, and 
will change what is being proved, popping out an unproved subgoal to the 
top level. For example, if we remove the Solver tactic in Fig. 1(a), the proof 
is left unfinished with the subgoal $Y < k + 1 \Longrightarrow Y < k + 2$ remaining. 

Complete subproof is the inverse operation, and grafts on a new subtree. This 
can resolve a previously unproved subgoal, or generate new subgoals. 


2.1 Finding somewhere to transform 

First, to apply a transformation, we need to know where in a target hiproof it 
should be applied. A natural way to find a transformation point is to search 
for a node satisfying some properties: this is where queries enter the picture. 
(Similarly, update languages that have been defined elsewhere for semistructured 
data and graphs also use queries to position updates; see Sect. 7.) 

We have already designed PrQL, a query language for hiproofs, so it is nat- 
ural to reuse it. PrQL is a structured query language which combines property 
queries (that look at local properties on nodes) with structuring operations (that 
combine queries across connected nodes, decomposing the tree). These can be 
defined with recursion and logical connectives, giving a powerful language that 
can encode search in queries. For example, the PrQL query 

somewhere (atomic ExIntro then atomic Trivial) 

is satisfied by the hiproof in Fig. 1(c). The atomic operator examines a label 
on a bottom-most nested node. The then operator decomposes the target graph 
across the proof tree sequence. Similarly, we can decompose sibling hiproofs 
with beside and nested hiproofs with inside, building up patterns. Patterns 
may contain match variables that get instantiated with names of rules or box 
labels. Using recursion we can define operators like somewhere (finds a match 
in any subtree) and nearby (finds a match in any subtree at the same nesting 
depth). See Sect. 3.1 for more details of PrQL. 

However, so far there is not yet a notion of where a query is satisfied; we do 
not have a way to describe where ExIntro or Trivial rules were actually found. 
To pick out specific nodes in a hiproof, we extend the query language to return 
positions: a new type of match variable standing for a (sub)hiproof where a query 
is satisfied. We add the new query term “at X” which matches X against the 
“currently examined” node in the tree. So 

somewhere inside Induction nearby (at X ∧ atomic Trivial) 

returns locations X where Trivial appears immediately inside an Induction box. 

Unlike labels for boxes and atomic rule names, nodes in our proof trees are 
abstract: we do not need user-level syntax for writing their identities. So at can 
only locate a position by properties; it cannot pick out a specific node concretely. 
But the query language is precise enough that, for any specific node in the tree, 
there is a query which picks out that node uniquely (see Prop. 1 in Sect. 5). 

2.2 Updating proofs 

Now we have a way to specify transformation points, we can show how our up- 
date operations are written. Several language design choices are possible. We 
have followed an SQL-like paradigm, matching positions then using one-shot 
operations which can update a large proof in-place, based on the selected posi- 
tions. A more ambitious choice would be to design a hybrid query and update 
language, with looping and branching to build up complex transformations. But 
we first want to understand the update combinators that are common to both. 

As a first example, to turn Fig. 1(c) into Fig. 1(b) we use a transformation 
which adds a box around a given subtree, called box: 

box X to Y Z as Induction where 
    (at X ∧ atomic Rule) then (seq Y beside seq Z)                    (1) 

where the recursive query seq picks out a sequence ending at X: 

seq X ≝ μQ. * then Q ∨ (at X ∧ ¬(* then *)) 

Besides adding boxes, we can remove them with unbox: 

unbox X where at X ∧ inside Solver 

which removes the Solver box around the result of an automatic tactic. Instead 
we could rename it, simply writing: rename Solver as Auto. 

So far, these operations have not changed what is proved in the hiproof. 
Other updates change the underlying proof tree, but maintain its validity. For 
example, if we are not interested in a particular subtree of a proof 

deletetree X where inside Meson at X 

then this removes the subtree generated by an automatic procedure, just leav- 
ing the name of the procedure. In the hiproof structure, we do not forget that 
something is unproved; the subtree leaves a dangling edge. 

Dually, we can fill in a proof for such a dangling edge; this is a refinement 
operation in the sense that it extends the proof: 

refine X with s where at X ∧ unproved γ 

Here, s is a literal term in the syntax for hiproofs, which proves the goal $\gamma$. 

Finally, it can be useful to use a more general replacement transformation 
which is defined using deletetree then refine. For example, to find useless 
detours in a proof tree, we use the query: 

useless X Y ≝ (at X ∧ goal G) then nearby (at Y ∧ goal G) 

This identifies a path from X to Y where we hit the same goal $G = \gamma$. It might 
even be a tactic which is worse than useless, in that it has transformed a goal $\gamma$ 
into several more goals to prove including $\gamma$ again. Now the replace update 

replace X by Y where (useless X Y) 

removes this detour. 

3 Syntactic Hiproofs and PrQL queries 

This section introduces previous material as background. We are as concise as 
possible and refer the reader to previous papers for more details [2, 3, 14]. 

Hiproofs add structure to an underlying derivation system, a simple kind of 
logical framework. A derivation system is given by a set $\mathcal{G}$ of goals (intuitively: 
possibly provable sequents or judgements), ranged over by $\gamma$, and a set of atomic 
inference rules ranged over by $a$. Atomic rules are composed to give hiproofs, 
which have a functional reading: a hiproof maps a finite list of input goals 
$g_1 = [\gamma_1, \ldots, \gamma_n]$ to a list of output subgoals $g_2 = [\gamma_1, \ldots, \gamma_m]$. 

Informally, we draw hiproofs as inverted trees with a nested structure. Formally, 
a hiproof is given by two forests on the same set of nodes, as explained in 
Sect. 4. Syntactically, a hiproof can be written as a term: 

$$
\begin{array}{rcll}
s & ::= & a \mid \mathsf{id} & \text{atomic and identity} \\
  & \mid & [l]\, s \mid s_1 \mathbin{;} s_2 & \text{labelling and sequencing} \\
  & \mid & \langle\rangle \mid s_1 \otimes s_2 & \text{empty and tensor (juxtaposition)}
\end{array}
$$

where $l \in \mathcal{L}$, an arbitrary set of names, and $a \in \mathcal{A}$ for some special subset 
$\mathcal{A} \subseteq \mathcal{L}$. We think of labels as standing for names of tactics or proof rules, or 
atomic steps; they have no semantic content. For example, the proofs in Fig. 1 
are written syntactically as 

([Induction] Rule ; Base ⊗ Step) ; [Solver] Rewrite                  (2) 

[Induction] Rule ; Base ⊗ (Step ; [Solver] Rewrite)                  (3) 

Rule ; (ExIntro ; Trivial) ⊗ (ApplyIH ; Rewrite)                     (4) 
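
To make the term syntax concrete, the grammar can be transcribed as an algebraic data type. The following Python sketch is only an illustration: the class names (`Atom`, `Label`, `Seq`, `Tensor`, ...) and the helper functions `atoms` and `box_depth` are hypothetical and not part of PrQL, and it assumes that tensor binds tighter than sequencing when reading term (4).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str  # an atomic rule a


@dataclass(frozen=True)
class Id:
    pass  # the identity hiproof


@dataclass(frozen=True)
class Empty:
    pass  # the empty hiproof


@dataclass(frozen=True)
class Label:
    label: str    # box label l
    body: "Term"  # the hiproof nested inside the box


@dataclass(frozen=True)
class Seq:
    first: "Term"   # s1 in "s1 ; s2"
    second: "Term"  # s2 in "s1 ; s2"


@dataclass(frozen=True)
class Tensor:
    left: "Term"   # left operand of the tensor
    right: "Term"  # right operand of the tensor


Term = Union[Atom, Id, Empty, Label, Seq, Tensor]


def atoms(t: Term) -> list:
    """Names of the atomic rules in t, in left-to-right syntactic order."""
    if isinstance(t, Atom):
        return [t.name]
    if isinstance(t, Label):
        return atoms(t.body)
    if isinstance(t, Seq):
        return atoms(t.first) + atoms(t.second)
    if isinstance(t, Tensor):
        return atoms(t.left) + atoms(t.right)
    return []  # Id and Empty contain no atomic rules


def box_depth(t: Term) -> int:
    """Maximum nesting depth of labelled boxes in t."""
    if isinstance(t, Label):
        return 1 + box_depth(t.body)
    if isinstance(t, Seq):
        return max(box_depth(t.first), box_depth(t.second))
    if isinstance(t, Tensor):
        return max(box_depth(t.left), box_depth(t.right))
    return 0


# Term (4): the flat proof of Fig. 1(c).
flat = Seq(Atom("Rule"),
           Tensor(Seq(Atom("ExIntro"), Atom("Trivial")),
                  Seq(Atom("ApplyIH"), Atom("Rewrite"))))

# The same proof with a Base box around the two base-case steps.
boxed = Seq(Atom("Rule"),
            Tensor(Label("Base", Seq(Atom("ExIntro"), Atom("Trivial"))),
                   Seq(Atom("ApplyIH"), Atom("Rewrite"))))
```

Under these assumptions `atoms(flat)` and `atoms(boxed)` agree, while `box_depth` differs: adding a label changes the hierarchy but not the underlying atomic rules.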


3.1 Structured queries in PrQL 

The definition of PrQL starts with matches built from wildcards and match 
variables, constants (atoms, sets and predicates) and negation (to construct the 
complement of a match). Let $\mathit{Var}_N$ be a set of schematic variables standing for 
names, ranged over by $N$ in general and $A$ when we suggest an atomic rule name 
or $L$ a label name. Let $\mathit{Var}_G$ be a set of variables standing for lists of goals. The 
name matches and goal matches are given by: 

$$
nm ::= a \mid l \mid \bullet \mid \xi \mid N \mid \neg nm
\qquad\qquad
gm ::= \gamma \mid \varphi \mid G \mid \neg gm
$$

where $\xi$ stands for a logic-dependent predicate on names, and $\varphi$ stands for a 
logic-dependent predicate on goals used to check some structural property of 
the goal term. For example we might have a predicate that checks whether a 
goal $\gamma$ is in the form of a Horn clause, when $\varphi_{horn}(\gamma)$ holds. The special name $\bullet$ 
is used to label unproved goals; the name $* = \neg\bullet$ serves as a wildcard. 

We use matches to build up queries, q, as below. The extension to PrQL to 
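locate vertices uses a set of match variables $\mathit{Var}_V$, ranged over by $X$. 

$$
\begin{array}{rcll}
q & ::= & * & \text{anything non-empty} \\
  & \mid & \mathsf{at}\ X & \text{matches at node } X \\
  & \mid & \mathsf{atomic}\ nm & \text{atomic rule match} \\
  & \mid & \mathsf{inside}\ nm\ q & q \text{ satisfied inside box with label matching } nm \\
  & \mid & q_1\ \mathsf{then}\ q_2 & q_1 \text{ and } q_2 \text{ satisfied by successive nodes} \\
  & \mid & q_1\ \mathsf{beside}\ q_2 & q_1 \text{ and } q_2 \text{ satisfied by adjacent nodes} \\
  & \mid & \mathsf{goal}\ gm & \text{proved goal matches } gm \\
  & \mid & q_1 \wedge q_2 \mid q_1 \vee q_2 \mid \neg q & \text{compound queries} \\
  & \mid & \mu Q.\, q & \text{recursive query}
\end{array}
$$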

Queries are built from schematic hiproof terms. They are posed against an implicit 
hiproof subject, instantiating the match variables and testing goals. Compound 
queries are built using logical connectives and recursion. This core language 
allows many useful derived forms, like the search operator somewhere. 
We can examine gaps in proofs too; to assert that the hiproof has $\gamma$ as an unsolved 
goal we write: 

unproved γ ≝ goal γ ∧ atomic • 

This works because we model ‘dangling’ edges as empty boxes labelled with •. 

4 Denotational hiproofs 

A hiproof consists of two forests on the same set of nodes, with a distinguished 
root, satisfying some conditions [14]. To relate to a derivation system (where 
premises of inference rules have an ordering), we add a left-to-right ordering 
among siblings. To relate to the syntax, we use a more general forest notion 
first, then restrict to hiproofs. To model incomplete (partial) proofs, we add 
nodes corresponding to unproved goals. Lastly, we extend labelling to attach to 
each node the goal it validates, as shown on edges entering nodes in Fig. 1(a). 

Given a forest $F$ defined by a relation $R$ on a set of vertices, we write 
$\mathit{siblings}_R(v, v')$ if $v$ and $v'$ are children of the same $R$-parent. Given a vertex 
$v$, we write $\mathit{isroot}_R(v)$ for the assertion that $v$ is a root wrt $R$, i.e., 
$\forall v'.\; v' \mathrel{R} v \Rightarrow v = v'$, and $\mathit{isleaf}_R(v)$ for the dual, i.e., $\forall v'.\; v \mathrel{R} v' \Rightarrow v = v'$. 
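
These three predicates are easy to compute when a relation is stored as a finite set of pairs. The Python sketch below assumes that a pair `(p, c)` records `p R c`, i.e. that `p` is the `R`-parent of `c`; the function names mirror the notation above, and the example relation is a hypothetical encoding of the sequencing edges of the flat proof in Fig. 1(c).

```python
def isroot(R, v):
    """v is a root wrt R: every v' with v' R v is v itself."""
    return all(p == v for (p, c) in R if c == v)


def isleaf(R, v):
    """v is a leaf wrt R: every v' with v R v' is v itself."""
    return all(c == v for (p, c) in R if p == v)


def siblings(R, v, w):
    """v and w are children of the same R-parent."""
    parents_v = {p for (p, c) in R if c == v and p != v}
    parents_w = {p for (p, c) in R if c == w and p != w}
    return bool(parents_v & parents_w)


# Sequencing edges of the flat proof: Rule has two children.
seq = {
    ("Rule", "ExIntro"), ("Rule", "ApplyIH"),
    ("ExIntro", "Trivial"), ("ApplyIH", "Rewrite"),
}
```

With this encoding, `Rule` is the only root, `Trivial` and `Rewrite` are the leaves, and `ExIntro` and `ApplyIH` are siblings.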

Definition 1 (Ordered Hiforest). An ordered hiforest $H = (V, L, \leq_i, \to_s, <)$ 
consists of a finite set of vertices $V$ with a labelling function $L : V \to (\mathcal{L} \cup \{\bullet\}) \times \mathcal{G}$ 
and three relations on $V \times V$. The relations are an inclusion order $\leq_i$ (which 
captures the nesting of vertices; $>_i$ is proper containment), a sequencing relation 
$\to_s$ (which captures the functional composition of nodes) and a child order $<$. 
These are subject to the following conditions: 

0. $(V, \leq_i)$ and $(V, \to_s)$ each form forests; $\leq_i$ and $<$ are partial orders. 

1. arrows target outer nodes: $v \to_s w$ and $v' >_i w \Rightarrow v' >_i v$. 

2. arrows emanate from inner nodes: $v \to_s w$ and $v' \leq_i v \Rightarrow v = v'$. 

3. inclusion & sequence are mutually exclusive: $v \leq_i w$ and $v \to_s^{*} w \Rightarrow v = w$. 

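4. boxes have unique roots: 

   $\mathit{siblings}_{\leq_i}(v, v') \wedge \mathit{isroot}_{\to_s}(v) \wedge \mathit{isroot}_{\to_s}(v') \Rightarrow v = v'$. 

5. children or top-level roots are totally ordered: 

   $\mathit{siblings}_{\to_s}(v, v') \vee (\mathit{isroot}_{\to_s}(v) \wedge \mathit{isroot}_{>_i}(v')) \Rightarrow v < v' \vee v' < v$. 

6. only leaves (wrt. sequencing and inclusion) may have $\bullet$ label: 

   $L(v) = (\bullet, \gamma) \Rightarrow \mathit{isleaf}_{\to_s \cup >_i}(v)$. 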

Each node in a hiforest is given a name and a goal. The goal is the theorem 
proved at that node. The unproved parts are the ‘dangling’ holes labelled by 
•. An ordered hiforest proves a sequence of top-level goals, whereas a hiproof 
proves just one. 

Definition 2 (Ordered Hiproof). An ordered hiproof is an ordered hiforest 
which satisfies the additional constraint: 

7. Top-level roots are unique: $\mathit{isroot}_{\to_s \cup >_i}(v) \wedge \mathit{isroot}_{\to_s \cup >_i}(v') \Rightarrow v = v'$. 

We are mainly interested in valid hiproofs, which are those corresponding to 
a proof in the underlying derivation system. 

Definition 3 (Validity). A hiforest $H$ is valid if it corresponds to a sequence 
of (possibly incomplete) proof trees in the underlying derivation system; we write 
$H \models g_1 \longrightarrow g_2$ if this holds, where $g_1$ is the list of goals on the outermost 
roots of $H$, and $g_2$ is the list of unproved goals on the holes, as ordered by 
extending $<$ to the leaves of the tree. 

A map between two hiforests is a map between the vertices and the labels 
which preserves the orderings and the labelling. We say a hiforest $H_1$ refines 
to a hiforest $H_2$, $H_1 \sqsubseteq H_2$, if there is an inclusion from $H_1$ to $H_2$ which also 
preserves the roots wrt $>_i$. 

We now define some operations on the two dimensions of hiforests which will 
form the semantic foundations of our transformations. For brevity, definitions 
are given informally here, and made precise in the appendix. Given two hiforests 
$H_1$ and $H_2$ such that $H_1 \models g_1 \longrightarrow g$ and $H_2 \models g \longrightarrow g_2$, we define a composition 
operation $\mathit{graft}(H_1, H_2)$ that ‘grafts’ the roots of $H_2$ into the dangling 
goals of $H_1$, such that $\mathit{graft}(H_1, H_2) \models g_1 \longrightarrow g_2$; it can be characterised as the 
smallest hiforest $H_3$ which refines $H_1$, $H_1 \sqsubseteq H_3$, for which there is a (necessarily 
injective) map $\alpha : H_2 \to H_3$. This is an instance of a more general operation 
$\mathit{graft}(H_1, H_2, v_1, \ldots, v_m)$ which grafts the $m$ roots of $H_2$ into the specified 
danglers $v_1, \ldots, v_m$ of $H_1$, where $H_1$ may contain more than $m$ danglers. 

Given a vertex $v \in V$ in hiforest $H$, we define $\mathit{cover}(H, v)$ as the hiproof 
containing the set of vertices in $H$ reachable from $v$ by $>_i$ or $\to_s$, including 
$v$ itself. If $H \models g_1 \longrightarrow g_2$ then $\mathit{cover}(H, v) \models [\gamma_v] \longrightarrow g_v$ where 
$L(v) = (l, \gamma_v)$ and $g_2 = g_2' \frown g_v \frown g_2''$ (with $\frown$ denoting list concatenation). The 
operation $\mathit{chop}(H, v)$ removes exactly these vertices, replacing them with a hole. So 
$\mathit{chop}(H, v) \models g_1 \longrightarrow g_3$ where $g_3$ is the list $g_2' \frown [\gamma_v] \frown g_2''$. Together, these 
operations are inverse to grafting, i.e. $\mathit{graft}(\mathit{chop}(H, v), \mathit{cover}(H, v), v) = H$ (modulo 
some technical restrictions). The final operations are $\mathit{box}(l, H)$ and $\mathit{unbox}(H)$ 
which add and remove ‘boxes’ around the roots of $H$, where a box is a node 
(labelled $l$) including all the other nodes (below that root). These are inverse as 
well: $\mathit{unbox}(\mathit{box}(l, H)) = H$. These two operations preserve validity and input 
and output goal lists. 
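
The interplay of chop, cover and graft can be illustrated on a much simpler model than ordered hiforests. The Python sketch below is an assumption-laden toy: it keeps only an ordered sequencing tree with a labelling (no inclusion order, no goals), stores it as a dictionary with the hypothetical keys `root`, `children` and `label`, and grafts a single tree into a single hole. It is meant to show the inverse law stated above, not the appendix definitions.

```python
HOLE = "•"  # label used for unproved (dangling) vertices


def reachable(H, v):
    """Vertices of the subtree rooted at v, including v itself."""
    out = [v]
    for c in H["children"].get(v, ()):
        out.extend(reachable(H, c))
    return out


def cover(H, v):
    """The subtree of H rooted at v."""
    keep = set(reachable(H, v))
    return {
        "root": v,
        "children": {u: cs for u, cs in H["children"].items() if u in keep},
        "label": {u: l for u, l in H["label"].items() if u in keep},
    }


def chop(H, v):
    """Remove the subtree at v, leaving a hole in its place."""
    gone = set(reachable(H, v)) - {v}
    return {
        "root": H["root"],
        "children": {u: cs for u, cs in H["children"].items()
                     if u not in gone and u != v},
        "label": {u: (HOLE if u == v else l)
                  for u, l in H["label"].items() if u not in gone},
    }


def graft(H1, H2, v):
    """Plug the tree H2 into the hole v of H1 (vertex names assumed disjoint,
    except that the root of H2 may reuse the name of the hole)."""
    if H1["label"].get(v) != HOLE:
        raise ValueError(f"{v} is not a hole")
    r = H2["root"]
    children = {u: tuple(r if c == v else c for c in cs)
                for u, cs in H1["children"].items()}
    children.update(H2["children"])
    label = {u: l for u, l in H1["label"].items() if u != v}
    label.update(H2["label"])
    return {"root": r if H1["root"] == v else H1["root"],
            "children": children, "label": label}


# The flat proof of Fig. 1(c), with hypothetical vertex names.
H = {
    "root": "rule",
    "children": {"rule": ("ex", "ih"), "ex": ("triv",), "ih": ("rw",)},
    "label": {"rule": "Rule", "ex": "ExIntro", "triv": "Trivial",
              "ih": "ApplyIH", "rw": "Rewrite"},
}
```

In this toy model, chopping at `ih` leaves a hole where the step case was, and grafting the covered subtree back at the same vertex restores `H` exactly.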

5 Semantics for queries 

The query semantics we gave in [2] was based on querying syntax models di- 
rectly. Since hiproofs are constructed syntactically, this is in a sense the most 
direct approach. However, syntactic representations are not canonical, because 
a particular underlying tree structure can be denoted by many terms in the 
syntax. E.g., the proof in Fig. 1(c) can be expressed as in (4) or as 

Rule ; (ExIntro ⊗ ApplyIH) ; (Trivial ⊗ Rewrite) 


For the definition of boolean satisfaction of a query given in [2], this is not 
problematic as we can close under the syntactic equivalence given by the algebraic 
structure of hiproofs. But to define updates it is more delicate, since we need a 
firm notion of focus in the hiproof to anchor changes; e.g., example (1) does not 
work with the syntactic form above. We could use normal forms for syntactic 
terms, but the denotational model is more direct and also fits well with parallel 
work on implementation using graph databases, building on [21]. 

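The definition of query satisfaction in the denotational semantics uses a substitution 
to instantiate variables: $\sigma : (\mathit{Var}_N \to \mathcal{L}) \uplus (\mathit{Var}_G \to \mathcal{G}) \uplus (\mathit{Var}_V \to V)$, 
where $V$ is the set of vertices of the hiproof being queried. The base case for 
query satisfaction is for names and goals, treated very similarly: 

$$
\begin{array}{lllclll}
n \models_\sigma n' & \text{iff} & n = n' & \qquad & \gamma \models_\sigma \gamma' & \text{iff} & \gamma = \gamma' \\
\xi \models_\sigma n & \text{iff} & \xi(n) & & \varphi \models_\sigma \gamma & \text{iff} & \varphi(\gamma) \\
N \models_\sigma n & \text{iff} & \sigma(N) = n & & G \models_\sigma \gamma & \text{iff} & \sigma(G) = \gamma \\
(\neg N) \models_\sigma n & \text{iff} & \neg(N \models_\sigma n) & & (\neg G) \models_\sigma \gamma & \text{iff} & \neg(G \models_\sigma \gamma)
\end{array}
$$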

For a relation $R$ and distinct $a$, $b$, we write $a \mathrel{R^1} b$ if $a \mathrel{R} b$ and there is no 
intermediate $c$ such that $a \mathrel{R} c$ and $c \mathrel{R} b$. 
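
For a finite relation stored as a set of pairs, the one-step relation can be computed directly from this definition. The Python sketch below is illustrative only; the function name `immediate` is hypothetical, and "intermediate" is read as "distinct from both endpoints".

```python
def immediate(R):
    """Pairs (a, b) with a R b, a != b, and no c other than a and b
    such that a R c and c R b."""
    elems = {x for pair in R for x in pair}
    return {
        (a, b)
        for (a, b) in R
        if a != b and not any(
            (a, c) in R and (c, b) in R
            for c in elems if c not in (a, b)
        )
    }


# A child order on three siblings, given as a transitive strict order.
child_order = {("a", "b"), ("b", "c"), ("a", "c")}
```

Here `immediate(child_order)` keeps only the adjacent pairs `("a", "b")` and `("b", "c")`, which is the sense of "adjacent" and "successive" used for beside and then.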


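Definition 4 (Query satisfaction). Let $H$ be an ordered hiforest with vertices 
$V$ and $q$ a query. Satisfaction of $q$ for $H$ at a vertex $v \in V$ wrt a substitution $\sigma$ 
is defined as the least relation $v \models_\sigma q$ satisfying the following clauses: 

$$
\begin{array}{lll}
v \models_\sigma * & \text{always} & \\
v \models_\sigma \mathsf{at}\ X & \text{iff} & \sigma(X) = v \\
v \models_\sigma \mathsf{goal}\ gm & \text{iff} & gm \models_\sigma \gamma \text{ where } L(v) = (l, \gamma) \text{ for some } l \\
v \models_\sigma \mathsf{inside}\ nm\ q & \text{when} & nm \models_\sigma l \text{ where } L(v) = (l, \gamma) \text{ for some } \gamma \\
 & & \text{and } \forall w.\; w <_i v \Rightarrow w \models_\sigma q \\
v \models_\sigma q_1\ \mathsf{beside}\ q_2 & \text{when} & v \models_\sigma q_1 \text{ and } \exists w.\; v <^1 w \text{ with } w \models_\sigma q_2 \\
v \models_\sigma q_1\ \mathsf{then}\ q_2 & \text{when} & v \models_\sigma q_1 \text{ and } \exists w.\; v \to_s^1 w \text{ with } w \models_\sigma q_2 \\
v \models_\sigma q_1 \wedge q_2 & \text{when} & v \models_\sigma q_1 \text{ and } v \models_\sigma q_2 \\
v \models_\sigma q_1 \vee q_2 & \text{when} & v \models_\sigma q_1 \text{ or } v \models_\sigma q_2 \\
v \models_\sigma \neg q & \text{when} & \neg(v \models_\sigma q) \\
v \models_\sigma \mu Q.\, q & \text{when} & v \models_\sigma q[\mu Q.\, q / Q]
\end{array}
$$

A query $q$ is satisfied by a substitution $\sigma$ on a hiforest $H$, written $H \models_\sigma q$, if it is 
satisfied on each outermost root vertex of $H$, i.e., $\forall v.\; \mathit{isroot}_{\to_s \cup >_i}(v) \Rightarrow v \models_\sigma q$. 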

Def. 4 works by navigating in a fixed hiproof $H$ to find satisfying vertices $v$. 
Because a vertex determines a sub-hiproof, this is equivalent to a structural 
definition as given in [2], which works by decomposing the subject hiproof during 
navigation, defining a relation $s \models_\sigma q$. Note that in this model atomic is 
definable as an empty box: atomic nm = inside nm (¬*). 
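
A fragment of Def. 4 can be prototyped as a recursive evaluator. The Python sketch below rests on several simplifying assumptions that are not in the paper: queries are nested tuples, name matches are literal names only, goals, beside and recursion are omitted, the set `seq` holds the one-step sequencing edges, and `inside` holds pairs `(w, v)` meaning that `w` is nested inside `v`. All names are hypothetical.

```python
def sat(H, sigma, v, q):
    """Does vertex v satisfy query q under substitution sigma?"""
    tag = q[0]
    if tag == "any":                      # the wildcard query *
        return True
    if tag == "at":                       # at X
        return sigma.get(q[1]) == v
    if tag == "inside":                   # inside nm q
        _, name, body = q
        inner = [w for (w, box) in H["inside"] if box == v]
        return H["label"][v] == name and all(
            sat(H, sigma, w, body) for w in inner)
    if tag == "then":                     # q1 then q2
        _, q1, q2 = q
        return sat(H, sigma, v, q1) and any(
            sat(H, sigma, w, q2) for (u, w) in H["seq"] if u == v)
    if tag == "and":
        return sat(H, sigma, v, q[1]) and sat(H, sigma, v, q[2])
    if tag == "or":
        return sat(H, sigma, v, q[1]) or sat(H, sigma, v, q[2])
    if tag == "not":
        return not sat(H, sigma, v, q[1])
    raise ValueError(f"unsupported query form: {tag}")


def atomic(name):
    """atomic nm, encoded as an empty box: inside nm (not *)."""
    return ("inside", name, ("not", ("any",)))


# The flat proof of Fig. 1(c), with hypothetical vertex names.
H = {
    "label": {"rule": "Rule", "ex": "ExIntro", "triv": "Trivial",
              "ih": "ApplyIH", "rw": "Rewrite"},
    "seq": {("rule", "ex"), ("rule", "ih"), ("ex", "triv"), ("ih", "rw")},
    "inside": set(),
}

# A variant where a Base box b contains the two base-case steps.
H_boxed = {
    "label": {"b": "Base", "ex": "ExIntro", "triv": "Trivial"},
    "seq": {("ex", "triv")},
    "inside": {("ex", "b"), ("triv", "b")},
}

pattern = ("then", atomic("ExIntro"), atomic("Trivial"))
located = ("and", ("at", "X"), atomic("Trivial"))
```

For instance, `sat(H, {}, "ex", pattern)` holds while `sat(H, {}, "rule", pattern)` does not, and `located` is satisfied at `triv` only by substitutions that map `X` to `triv`. In `H_boxed` the vertex `b` satisfies `("inside", "Base", ("any",))` but not `atomic("Base")`, because the box is not empty.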

Definition 5 (Query interpretation). Let $H$ be an ordered hiforest and $q$ 
a query. Then we define the interpretation of $q$ in $H$ as the set of satisfying 
substitutions: $[\![q]\!]_H = \{ \sigma \mid H \models_\sigma q \}$. 
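
Read operationally, Def. 5 asks for all substitutions that make the query hold. The Python sketch below computes this by brute-force enumeration, which is a cruder technique than the unification-based algorithm of [2] and is shown only to make the set-of-substitutions reading concrete. It handles vertex variables only, and the parameter `holds` is a hypothetical stand-in for the satisfaction relation.

```python
from itertools import product


def interpret(vertices, variables, holds):
    """All assignments of vertices to variables for which holds(sigma) is true.

    Tries len(vertices) ** len(variables) candidate substitutions, so the
    cost grows exponentially with the number of match variables.
    """
    answers = []
    for choice in product(sorted(vertices), repeat=len(variables)):
        sigma = dict(zip(variables, choice))
        if holds(sigma):
            answers.append(sigma)
    return answers


# Hypothetical labelling of two vertices.
labels = {"ex": "ExIntro", "triv": "Trivial"}

# Stand-in for "at X and atomic Trivial": X must be bound to a Trivial vertex.
found = interpret(labels.keys(), ["X"],
                  lambda s: labels[s["X"]] == "Trivial")
```

In this toy setting `found` contains the single substitution mapping `X` to `triv`; a query with two vertex variables over the same vertices would already require four candidates.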

Our language is expressive but queries can be expensive. In [2] we gave a naive 
algorithm for $[\![q]\!]$ using unification to instantiate variables, which is exponential 
in the number of match variables. Recursion and match variable unification 
unavoidably affect the data complexity of our queries (see basic results e.g., [1, 
12, 18]). For large proofs, we would want a fragment that is more feasible but 
captures most desirable examples. The following proposition is the denotational 
counterpart of a similar proposition in [2]. 

Proposition 1. Given a hiproof $H$, one of its vertices $v$ and a variable $X$, there 
is a query $Q(v, X)$ which locates $v$ at $X$, i.e., $[\![Q(v, X)]\!]_H = \{\sigma\}$ with $\sigma(X) = v$. 


6 Transformations and their semantics 


We now introduce the core update operations formally. Note that we do not 
want to allow arbitrary “tree surgery” of the hiproof structure; we want update 
operations to preserve semantic validity. Updates have the syntax: 


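$$
\begin{array}{rcll}
u & ::= & \mathsf{box}\ X_r\ \mathsf{to}\ X_1 \ldots X_n\ \mathsf{as}\ l & \text{add nested box around } X_r \ldots X_1 \ldots X_n \\
  & \mid & \mathsf{unbox}\ X & \text{unfold nested box at } X \\
  & \mid & \mathsf{rename}\ X\ \mathsf{as}\ l & \text{change label on box at } X \\
  & \mid & \mathsf{refine}\ X\ \mathsf{with}\ s & \text{add a new sub-hiproof at } X \\
  & \mid & \mathsf{deletetree}\ X & \text{delete subtree at } X \\
  & \mid & \mathsf{replace}\ X\ \mathsf{by}\ Y & \text{replace subtree at } X \text{ by that at } Y
\end{array}
$$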


The box operation is the most interesting. It introduces a nested box, whose 
contents are nodes in the partial subtree with $X_r$ as root and $X_1 \ldots X_n$ as leaves. 
This allows us to gather to an arbitrary depth, using a query to select either end 
of the path; this is useful to package up repeated applications of rules. The other 
update operations are straightforward to understand. An update is applied by 
combining with a query to instantiate node variables in a hiproof, written as 
update u q. This matches q to the root of the hiproof; a more common pattern 
is to search the hiproof for matches, as seen in the examples in Sect. 2.2. This 
is written and defined as u where q = update u (somewhere q). 


6.1 Interpretation of transformations 

We can specify positions in a hiproof, but we still need to solve a well-known 
problem with tree and graph updates. Suppose a query picks out several nodes 
and a transformation changes the structure; then simultaneous updates may 
overlap. The result may be ill-defined, or may depend on the execution order. 
The semantics as given here is based on single-valued answers to queries; where 
a query has several answers, there may be several update results, representing 
applying the operation to different positions in the tree. To have a global effect, 
the update results may be merged if they do not conflict, or we may simply re- 
peatedly apply a query and update. We are not yet investigating implementation 
in detail, so making any such choices for PrQL could be premature; we prefer 
to first pin down an accurate semantics. Later on, we plan to extend the lan- 
guage to allow more efficient constructs, avoiding multiple passes and using type 
systems to ensure safety; we will relate back to the present, intended semantics. 
To interpret updates, we use the operations in Sect. 4 and extra definitions: 

(i) A combinator to transform a subforest of $H$ with a function $f$: 

    $\mathit{at}(H, v, f) = \mathit{graft}(\mathit{chop}(H, v), f(\mathit{cover}(H, v)), v)$ 

(ii) The box operator specialised to box only down to vertices $v_1, \ldots, v_n$: 

    $\mathit{addbox}(H, l, v_1, \ldots, v_n) = \mathit{graft}(\mathit{box}(l, \mathit{chop}_n(H, v_1, \ldots, v_n)), \mathit{cover}_n(H, v_1, \ldots, v_n), v_1, \ldots, v_n)$ 

    where $\mathit{chop}_n(H, v_1, \ldots, v_n)$ and $\mathit{cover}_n(H, v_1, \ldots, v_n)$ are the obvious 
    generalisations of chop and cover to $n$ arguments. 

(iii) To add or remove boxes at the subforest given by $v_r$: 

    $\mathit{addboxat}(H, l, v_r, v_1, \ldots, v_n) = \mathit{at}(H, v_r, \lambda H.\, \mathit{addbox}(H, l, v_1, \ldots, v_n))$ 
    $\mathit{unboxat}(H, v_r) = \mathit{at}(H, v_r, \mathit{unbox})$ 

(iv) To change the label of a vertex: let $H = (V, L, \leq_i, \to_s, <)$, $v \in V$ and $l \in \mathcal{L}$; 
    then $L'$ is defined as $L'(v') = (l, \gamma)$ for $v' = v$, where $L(v) = (l', \gamma)$, and 
    $L'(v') = L(v')$ otherwise. Then $\mathit{relabel}(H, v, l) = (V, L', \leq_i, \to_s, <)$. 

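Definition 6 (Interpretation of transformations). Let $H$ be a hiproof and 
$q[X_1 \ldots X_n]$ a query with match variables instantiated by $\sigma$. The meaning of an 
update wrt $\sigma$ is a partial function, defined when the RHS is defined: 

$$
\begin{array}{lcl}
[\![\mathsf{box}\ X_r\ \mathsf{to}\ X_1 \ldots X_n\ \mathsf{as}\ l]\!]^{\sigma}_{H} & = & \mathit{addboxat}(H, l, \sigma(X_r), \sigma(X_1), \ldots, \sigma(X_n)) \\
[\![\mathsf{unbox}\ X]\!]^{\sigma}_{H} & = & \mathit{unboxat}(H, \sigma(X)) \\
[\![\mathsf{rename}\ X\ \mathsf{as}\ l]\!]^{\sigma}_{H} & = & \mathit{relabel}(H, \sigma(X), l) \\
[\![\mathsf{refine}\ X\ \mathsf{with}\ s]\!]^{\sigma}_{H} & = & \mathit{graft}(H, [\![s]\!], \sigma(X)) \\
[\![\mathsf{deletetree}\ X]\!]^{\sigma}_{H} & = & \mathit{chop}(H, \sigma(X)) \\
[\![\mathsf{replace}\ X_1\ \mathsf{by}\ X_2]\!]^{\sigma}_{H} & = & \mathit{graft}(\mathit{chop}(H, \sigma(X_1)), \mathit{cover}(H, \sigma(X_2)), \sigma(X_1)) \\
[\![\mathsf{update}\ u\ q]\!]_{H} & = & \{\, [\![u]\!]^{\sigma}_{H} \mid \sigma \in [\![q]\!]_{H} \text{ and } [\![u]\!]^{\sigma}_{H} \text{ is defined} \,\}
\end{array}
$$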

Def. 6 gives a non-deterministic semantics; the result may be empty (if opera- 
tions are undefined) or there may be several results (for different instantiations). 
We do not say anything here about how to combine several results into one, 
as this may depend on the implementation; as hinted above, an implementa- 
tion may encode our core operations using a more general update language. In 
this setting, a better alternative would be to give criteria which guarantee a 
deterministic result. For the same reason, we do not yet investigate complexity 
results. 
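
The shape of the last clause of Def. 6 can be sketched in a few lines of Python. This is a toy under stated assumptions: a hiproof is reduced to its labelling (a dictionary from vertex names to labels), the query answers are supplied as a list of substitutions rather than computed, and "undefined" is modelled by raising `ValueError`. The functions `rename` and `update` are hypothetical illustrations, not an implementation of PrQL.

```python
def rename(H, sigma, var, new_label):
    """Toy counterpart of `rename X as l`: relabel the vertex bound to var.
    Undefined (raises ValueError) if var is not bound to a vertex of H."""
    v = sigma.get(var)
    if v not in H:
        raise ValueError(f"no vertex bound to {var}")
    return {**H, v: new_label}


def update(op, answers, H):
    """One result per substitution for which the operation is defined."""
    results = []
    for sigma in answers:
        try:
            results.append(op(H, sigma))
        except ValueError:
            continue  # the operation is undefined for this substitution
    return results


# Two boxes labelled Solver; the third answer does not name a vertex.
H = {"b1": "Solver", "b2": "Solver", "r": "Rewrite"}
answers = [{"X": "b1"}, {"X": "b2"}, {"X": "missing"}]

results = update(lambda h, s: rename(h, s, "X", "Auto"), answers, H)
```

Here `results` holds two alternative hiproofs, one per matching box, and the undefined case is silently dropped. Neither result renames both boxes, which is the non-determinism discussed above; obtaining a global effect would need merging or repeated application.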

7 Related work and conclusions 

This paper introduced an update extension of PrQL, a query language for 
hiproofs. We interpret queries and transformations using denotational seman- 
tics of hiproofs, which are graph-like structures subject to well-formedness con- 
straints. We showed that the basic operations are enough to capture desirable 
transformations, and that they preserve well-formedness and the connection to 
underlying proof trees. 

Connections in theorem proving. As larger proof developments are being
constructed, people are starting to explore ways to investigate them. Besides PrQL, a
query language has been proposed for OmDoc proofs [22]. The Proviola tool [24]
provides another means for proof understanding, by recording the output
issued by an interactive prover during proof development; impressively, it
has been used to annotate source code of large proofs in both Coq (the
Feit-Thompson proof [17]) and HOL Light (Hales's Flyspeck proof [23]). However,
Proviola sheds no light on a proof that proceeds in a single tactic execution
step. A hiproof-based tool would allow more dynamic exploration, by zooming
into proof objects to look at the fine detail, although the practical details of
managing such large proof objects will be challenging. Other researchers have
used proofs as the subject for search and machine learning (e.g., [19, 25]). Again,
this work might be usefully adapted to proof trees.

Conversely, we hope that our work can be adapted to transforming proof 
scripts. Rather than altering the extracted proof trees for HOL Light, we might 
want to impose the structural changes on the input proofs themselves, where 
possible. Work has been started on tools and foundations for proof refactoring 
towards this [5, 15, 27], but it is challenging: it requires understanding the mean- 
ing of input proof scripts, and how to transform them. By contrast, it is much 
easier to manipulate recorded output proof structures. 

Update languages for structured data. There is a large body of work from the 
last decade on query and update languages for general forms of structured data. 
PrQL was inspired by, among others, UnQL [7] and Graph Logic [9]; the latter 
was extended to Context Logic to consider updates [8] and the former extended 
to a language of functional transformations [11], in the setting of XML Update. 
The approach taken by the W3C to extend XQuery [10] has a more SQL-like 
flavour, similar to our approach. 

Transformations and hierarchy. To study PrQL updates and extensions further, 
fundamental results on tree queries [18], transformation operations [16] and com- 
plexity [4] should be possible to adapt. However, without restricting our language 
we are unlikely to improve on earlier complexity results [2], so instead we want 
to focus on translation into an efficient underlying XML or graph-based system. 
Having worked out the language design and semantics, we need to use the right 
level of abstraction before translation, taking hierarchy as a native construct. 
Hierarchical graphs have recently been studied in another setting, for structuring 
safety cases in a hierarchical way, providing a tool that performs transformations
like those studied here [13]. Related ideas for managing hierarchy in understand- 
ing provenance have recently been proposed [6]. 

Future and ongoing work. Several extensions to our update language are desir- 
able; at the least, to add constructs for composing and iterating transformations. 
Before pursuing that, we want to extend our practical experiments to transfor- 
mations. Taking the implementation of hiproofs in HOL Light [21], we can output 
them in a form suitable for a graph database system such as Neo4j [26], which 
can store and process very large structures on disk. Some of our queries and 
transformations can be captured in Neo4j’s query and update language Cypher,
although it remains to investigate how efficient the encoding is; alongside prac- 
tical experiments, we need to give a further theoretical analysis. 

Acknowledgements. The authors thank James Cheney and Domagoj Vrgoc for 
helpful discussions. 
A Additional technical details


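Definition 7 (Grafting). Let $H = (V, L, <_i, \to_s, <)$ be a valid hiforest with
$H \models g_1 \longrightarrow g$. Let $v_1, \ldots, v_n$ be distinct vertices in $V$, with $L(v_i) = (\bullet, \gamma_i)$
(and hence $n \leq \mathit{length}(g)$). Let $H' = (V', L', <'_i, \to'_s, <')$ be another hiforest
with $H' \models g' \longrightarrow g_2$, so it has $n$ overall roots $\{v_{r_1} \ldots v_{r_n}\} \subseteq V'$ ordered by $<'$
with $L'(v_{r_i}) = (l_i, \gamma_i)$. Suppose (wlog) $V \cap V' = \emptyset$.

Then we can define a new hiforest by

$$\mathit{graft}(H, H', v_1, \ldots, v_n) = (V - \{v_1 \ldots v_n\} \cup V',\ L|_{V - \{v_1, \ldots, v_n\}} \cup L',\ <''_i,\ \to''_s,\ <'')$$

The relations $<''_i$, $\to''_s$ and $<''$ are defined by:

$$v <''_i w \text{ iff either } \begin{cases} v <_i w \wedge w \notin \{v_1 \ldots v_n\} \\ v <'_i v_{r_i} \wedge v_i <_i w \\ v <'_i w \end{cases}$$

$$v \to''_s w \text{ iff either } \begin{cases} v \to_s w \wedge w \notin \{v_1 \ldots v_n\} \\ v \to_s v_i \wedge v_{r_i} \to'_s w \\ v \to'_s w \end{cases}$$

$$v <'' w \text{ iff either } \begin{cases} v < w \wedge w \notin \{v_1 \ldots v_n\} \\ (v < v_i \wedge v_{r_i} = w) \vee (v = v_{r_i} \wedge v_i <' w) \\ v <' w \end{cases}$$

If $H$ has exactly $n$ holes $v_1, \ldots, v_n$ (i.e., $g = [\gamma_1, \ldots, \gamma_n]$ and $L(v_i) = (\bullet, \gamma_i)$),
then we write $\mathit{graft}(H, H')$ as an abbreviation.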


Definition 8 (Cover). Given a hiforest $H = (V, L, <_i, \to_s, <)$ and vertex $v \in V$,
we define the cover of $v$ as all nodes below or inside $v$ by $V' = \mathit{cover}_{\to_s \cup >_i}(v)$,
where the cover of a relation $R$ is defined as $\mathit{cover}_R(x) = \{y \mid x\ R^{*}\ y\}$, and the
labellings and orderings restricted accordingly:

$$\mathit{cover}(H, v) = (V',\ L|_{V'},\ <_i|_{V' \times V'},\ \to_s|_{V' \times V'},\ <|_{V' \times V'}).$$
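
As a rough illustration of Def. 8, the following Python sketch computes the cover by a reachability search over $\to_s \cup >_i$ and then restricts the labelling and the three relations. The `Hiforest` record and the helper names are assumptions made for this example, not the paper's implementation; `incl` holds pairs (v, w) for v <_i w, read as "v lies inside w".

```python
from dataclasses import dataclass


@dataclass
class Hiforest:
    """Hypothetical encoding of (V, L, <_i, ->_s, <), for illustration only."""
    labels: dict   # L: vertex -> (label, goal); its keys play the role of V
    incl: set      # pairs (v, w) meaning v <_i w
    seq: set       # pairs (v, w) meaning v ->_s w
    order: set     # pairs (v, w) meaning v < w


def cover_vertices(h, v):
    """All y with v R* y for R = (->_s union >_i); v itself is included."""
    step = {}
    for a, b in h.seq:
        step.setdefault(a, set()).add(b)   # a ->_s b
    for a, b in h.incl:
        step.setdefault(b, set()).add(a)   # b >_i a
    seen, todo = {v}, [v]
    while todo:
        x = todo.pop()
        for y in step.get(x, ()):
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen


def restrict(rel, vs):
    """Restriction of a relation to vs x vs."""
    return {(a, b) for a, b in rel if a in vs and b in vs}


def cover(h, v):
    vs = cover_vertices(h, v)
    return Hiforest(
        labels={x: lab for x, lab in h.labels.items() if x in vs},
        incl=restrict(h.incl, vs),
        seq=restrict(h.seq, vs),
        order=restrict(h.order, vs),
    )
```

For a box `b` containing a node `a` with one dangling successor `x`, `cover_vertices` started at `b` collects all three vertices, whereas started at `a` it leaves the enclosing box out.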


When defining the chopping operation, we do not take out the node v, but 
replace its label with • to make it a dangler: 

Definition 9 (Chopping). Given a hiforest (or hiproof) $H = (V, L, <_i, \to_s, <)$
and vertex $v$, we define a new hiforest without nodes below or inside $v$ by
setting $V' = (V - \mathit{cover}_{\to_s \cup >_i}(v)) \cup \{v\}$ and

$$\mathit{chop}(H, v) = (V',\ L|_{V - \mathit{cover}_{\to_s \cup >_i}(v)} \cup \{v \mapsto (\bullet, \gamma) \mid L(v) = (l, \gamma)\},\ <_i|_{V' \times V'},\ \to_s|_{V' \times V'},\ <|_{V' \times V'})$$

We can generalise chop and cover to $n$ arguments. Chopping $n$ vertices
removes them sequentially from $H$, whereas the cover of $n$ vertices is a hiforest
with $n$ roots:

$$\mathit{chop}_1(H, v_1) = \mathit{chop}(H, v_1)$$
$$\mathit{chop}_n(H, v_1, \ldots, v_n) = \mathit{chop}_{n-1}(\mathit{chop}(H, v_1), v_2, \ldots, v_n)$$
$$\mathit{cover}_1(H, v_1) = \mathit{cover}(H, v_1)$$
$$\mathit{cover}_n(H, v_1, \ldots, v_n) = \mathit{cover}(H, v_1) \cup \mathit{cover}_{n-1}(H, v_2, \ldots, v_n)$$
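
Chopping and its n-ary generalisation can be sketched in Python as follows. This is a minimal illustration, assuming a hiforest is encoded as a tuple `(labels, incl, seq, order)` with `incl` holding pairs (v, w) for v <_i w; the encoding and the function names are hypothetical. Like the operation it models, `chop_n` is partial: it fails with a `KeyError` if an earlier chop has already removed a later vertex.

```python
DANGLER = "•"


def reachable(incl, seq, v):
    """Vertices below or inside v (closure of ->_s union >_i), including v."""
    step = {}
    for a, b in seq:
        step.setdefault(a, set()).add(b)   # a ->_s b
    for a, b in incl:
        step.setdefault(b, set()).add(a)   # b >_i a
    seen, todo = {v}, [v]
    while todo:
        x = todo.pop()
        for y in step.get(x, ()):
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen


def chop(h, v):
    """Remove everything below or inside v and turn v into a dangler."""
    labels, incl, seq, order = h
    _, goal = labels[v]                          # KeyError if v is absent
    gone = reachable(incl, seq, v) - {v}
    keep = set(labels) - gone                    # V' = (V - cover(v)) + {v}
    new_labels = {x: lab for x, lab in labels.items() if x in keep}
    new_labels[v] = (DANGLER, goal)              # v keeps its goal only

    def cut(rel):
        return {(a, b) for a, b in rel if a in keep and b in keep}

    return (new_labels, cut(incl), cut(seq), cut(order))


def chop_n(h, *vs):
    """chop_n(H, v1, ..., vn): chop the vertices one after another."""
    for v in vs:
        h = chop(h, v)
    return h
```

The loop in `chop_n` unrolls the recursive equation for chop_n from left to right, so the first vertex is chopped from the original hiforest and each later one from the result so far.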

To avoid notational difficulties when dealing with more than one root
simultaneously, we define boxing and unboxing only for hiproofs. The definitions
extend easily to hiforests by boxing each root of the forest separately (although
that is not needed in this paper). Note how the danglers in $H$ are not included
in the box introduced with $\mathit{box}(l, H)$.

Definition 10 (Boxing and Unboxing). Given a non-empty hiproof
$H = (V, L, <_i, \to_s, <)$ with overall root $v_r$, i.e., $\mathit{isroot}_{\to_s \cup >_i}(v_r)$, the boxing of
$H$ with a label $l$ is defined as

$$\mathit{box}(l, H) = (V \cup \{\ast\},\ L \cup \{\ast \mapsto (l, \gamma) \mid L(v_r) = (l', \gamma)\},\ <_i \cup \{(v, \ast) \mid v \in V, L(v) = (l', \gamma) \wedge l' \neq \bullet\},\ \to_s,\ < \cup \{(\ast, \ast)\})$$

The unboxing removes such a box (if it exists): let $H = (V, L, <_i, \to_s, <)$, then
we define

$$V' = \begin{cases} V - \{r\} & \text{if } \mathit{isroot}_{\to_s \cup >_i}(r),\ L(r) = (l, \gamma) \wedge l \neq \bullet \\ V & \text{otherwise} \end{cases}$$

Then:

$$\mathit{unbox}(H) = (V',\ L|_{V'},\ <_i|_{V'},\ \to_s,\ <|_{V'})$$
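
The following Python sketch mirrors Def. 10 under stated assumptions. The `Hiproof` record, the explicit `fresh` argument for the new box vertex, and the reading of isroot as "no predecessor under $\to_s$ and not inside any other vertex" are choices made for this illustration; the sketch follows the side conditions as stated and does not check any further well-formedness.

```python
from dataclasses import dataclass

DANGLER = "•"


@dataclass
class Hiproof:
    """Hypothetical encoding of (V, L, <_i, ->_s, <), for illustration only."""
    labels: dict   # L: vertex -> (label, goal)
    incl: set      # pairs (v, w) meaning v <_i w (v lies inside w)
    seq: set       # pairs (v, w) meaning v ->_s w
    order: set     # pairs (v, w) meaning v < w


def root_of(h):
    """The unique vertex with no predecessor under ->_s union >_i."""
    not_root = {b for _, b in h.seq} | {a for a, _ in h.incl}
    roots = [v for v in h.labels if v not in not_root]
    if len(roots) != 1:
        raise ValueError("expected a hiproof with exactly one overall root")
    return roots[0]


def box(l, h, fresh):
    """box(l, H): wrap every non-dangler of H in a new box vertex `fresh`."""
    if fresh in h.labels:
        raise ValueError("the box vertex must be fresh")
    _, goal = h.labels[root_of(h)]
    inside = {(v, fresh) for v, (lab, _) in h.labels.items() if lab != DANGLER}
    return Hiproof(
        labels={**h.labels, fresh: (l, goal)},
        incl=h.incl | inside,
        seq=set(h.seq),                     # ->_s is left unchanged
        order=h.order | {(fresh, fresh)},
    )


def unbox(h):
    """unbox(H): drop the overall root unless it is a dangler."""
    r = root_of(h)
    keep = set(h.labels)
    if h.labels[r][0] != DANGLER:
        keep.discard(r)
    return Hiproof(
        labels={v: lab for v, lab in h.labels.items() if v in keep},
        incl={(a, b) for a, b in h.incl if a in keep and b in keep},
        seq=set(h.seq),                     # ->_s is left unchanged
        order={(a, b) for a, b in h.order if a in keep and b in keep},
    )
```

On a freshly boxed hiproof the two functions are inverse to each other: `unbox(box(l, h, fresh))` gives back `h`, because the box vertex carries no $\to_s$ edges and every pair mentioning it is filtered out again.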

By careful inspection of the operation definitions we can show that the re- 
sulting hiforests indeed satisfy the conditions of Def. 1 and preserve semantic 
validity as stated earlier. 

Proposition 2 (Operations and validity). The semantic operations preserve 
the hiforest conditions and moreover, preserve semantic validity of hiproofs with 
the expected input-output goals. 

The final part of justifying our definitions is to show that the interpretation
of updates is well-defined, when query results are given and refinement has the
right shape. Specifically, refine $X$ with $s$ requires that when $\sigma(X) = v$ and the
subtree at $v$ has validity $\mathit{chop}(H, v) \models g_1 \longrightarrow g_2$, then the term given denotes
a hiforest with the same input-output shape.

For this we need to show that syntactic hiproof terms denote valid tree
structures. This is shown together with the definition of $[\![s]\!]$. Validity for syntactic
hiproof terms is written as $s \vdash g_1 \longrightarrow g_2$, meaning that the hiproof $s$ takes a
list of input (proven) goals $g_1$ to produce a list of output (unproved) goals $g_2$,
and is defined by the rules in Fig. 2.

$$\dfrac{\dfrac{\gamma_1 \cdots \gamma_n}{\gamma} \in a \text{ is an atomic inference}}{a \vdash \gamma \longrightarrow [\gamma_1, \ldots, \gamma_n]} \qquad \dfrac{s \vdash \gamma \longrightarrow g}{[l]s \vdash \gamma \longrightarrow g} \qquad \dfrac{}{\mathit{id} \vdash \gamma \longrightarrow \gamma}$$

$$\dfrac{s_1 \vdash g_1 \longrightarrow g \qquad s_2 \vdash g \longrightarrow g_2}{s_1 ; s_2 \vdash g_1 \longrightarrow g_2} \qquad \dfrac{s_1 \vdash g_1 \longrightarrow g'_1 \qquad s_2 \vdash g_2 \longrightarrow g'_2}{s_1 \otimes s_2 \vdash g_1 \wedge g_2 \longrightarrow g'_1 \wedge g'_2}$$

Fig. 2. Validation of hiproof terms (the symbol $\wedge$ stands for list append).
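
The validity judgement can be read as a function that computes the input and output goal lists of a term, or fails. The Python sketch below takes that reading; the class names, the use of strings for goals and the `ValueError` on ill-formed terms are assumptions for illustration, and the `Empty` case follows the empty-term case of Def. 11 rather than a rule shown in Fig. 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:            # atomic inference a with conclusion and premises
    name: str
    concl: str
    premises: tuple


@dataclass(frozen=True)
class Id:              # identity on a single goal
    goal: str


@dataclass(frozen=True)
class Label:           # [l]s
    label: str
    body: object


@dataclass(frozen=True)
class Seq:             # s1 ; s2
    first: object
    second: object


@dataclass(frozen=True)
class Tensor:          # s1 (x) s2
    left: object
    right: object


@dataclass(frozen=True)
class Empty:           # the empty term
    pass


def validate(s):
    """Return (inputs, outputs) with s |- inputs --> outputs, else raise."""
    if isinstance(s, Atom):
        return [s.concl], list(s.premises)
    if isinstance(s, Id):
        return [s.goal], [s.goal]
    if isinstance(s, Label):
        ins, outs = validate(s.body)
        if len(ins) != 1:
            raise ValueError("a labelled term needs exactly one input goal")
        return ins, outs
    if isinstance(s, Seq):
        ins, mid_out = validate(s.first)
        mid_in, outs = validate(s.second)
        if mid_out != mid_in:
            raise ValueError("outputs of s1 must match inputs of s2")
        return ins, outs
    if isinstance(s, Tensor):
        ins1, outs1 = validate(s.left)
        ins2, outs2 = validate(s.right)
        return ins1 + ins2, outs1 + outs2       # list append
    if isinstance(s, Empty):
        return [], []
    raise TypeError(f"not a hiproof term: {s!r}")


# ([l] a) ; (b (x) c), where a splits g into g1 and g2, closed by b and c.
a = Atom("a", "g", ("g1", "g2"))
b = Atom("b", "g1", ())
c = Atom("c", "g2", ())
shape = validate(Seq(Label("l", a), Tensor(b, c)))
```

In this example `shape` is `(["g"], [])`: the term proves `g` and leaves no goals open. Replacing the tensor by `b` alone makes the sequencing rule fail, since `a` leaves two goals open and `b` accepts only one.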


Definition 11 (Interpretation of hiproof terms). The definition of $[\![s]\!]$ is by
induction on the syntactic validity $s \vdash g_1 \longrightarrow g_2$, defining $[\![s]\!]$ and establishing
at the same time that $[\![s]\!] \models g_1 \longrightarrow g_2$. The cases are:

$a \vdash \gamma \longrightarrow [\gamma_1, \ldots, \gamma_n]$. Then $[\![a]\!]$ is the $n + 1$ point hiforest with nodes
$a, x_1, \ldots, x_n$. We set $a \to_s x_i$, $L(a) = (a, \gamma)$ and each $x_i$ is a “dangler”, so
$L(x_i) = (\bullet, [\gamma_i])$.

$\mathit{id} \vdash \gamma \longrightarrow \gamma$. Then $[\![\mathit{id}]\!]$ is the hiforest with one “dangler” node $\ast$, where
$L(\ast) = (\bullet, [\gamma])$.

$[l]s \vdash \gamma \longrightarrow g_2$. Then $[\![[l]s]\!] = \mathit{box}(l, [\![s]\!])$ since $[\![s]\!]$ has a unique top-level
root.

$s_1 ; s_2 \vdash g_1 \longrightarrow g_2$. Then $[\![s_1 ; s_2]\!] = \mathit{graft}([\![s_1]\!], [\![s_2]\!])$. The premises of the
validity rule and the induction hypothesis ensure that the grafting operation
is well-defined.

$s_1 \otimes s_2 \vdash g_1 \wedge g_2 \longrightarrow g'_1 \wedge g'_2$. Then $[\![s_1 \otimes s_2]\!]$ is the hiforest formed by
disjoint union of $[\![s_1]\!]$ and $[\![s_2]\!]$, with the ordering relation $<$ extended on the
roots and dangling nodes.

$\langle\rangle \vdash [\,] \longrightarrow [\,]$. $[\![\langle\rangle]\!]$ is the empty hiforest.


Note that denotational hiproofs are unique only up to the choice of node set $V$;
two hiproofs which have the same structure and labelling but differ only on $V$
are isomorphic [14]. The definitions above work with particular hiproofs, but it
can be verified that the choice of node names (but not labels!) is unimportant.


